{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Problem: Write a Transformer\n",
    "\n",
    "### Problem Statement\n",
    "Implement a **Transformer model** in PyTorch by completing the required sections. The model should consist of an embedding layer, a Transformer encoder, and an output layer for sequence processing and prediction.\n",
    "\n",
    "### Requirements\n",
    "1. **Define the Transformer Model Architecture**:\n",
    "   - **Embedding Layer**:\n",
    "     - Implement a layer to transform input data into a higher-dimensional space.\n",
    "     - Use a `torch.nn.Linear` or `torch.nn.Embedding` layer to create embeddings from the input.\n",
    "   - **Transformer Encoder**:\n",
    "     - Use `torch.nn.TransformerEncoder` or `torch.nn.Transformer` to process sequences with attention.\n",
    "     - Configure parameters such as the number of attention heads and encoder layers.\n",
    "   - **Output Layer**:\n",
    "     - Add a fully connected (linear) layer to reduce the transformer's sequence output into the desired output dimension.\n",
    "\n",
    "2. **Implement the Forward Method**:\n",
    "   - Map the input to the higher-dimensional space using the embedding layer.\n",
    "   - Pass the transformed input through the Transformer encoder.\n",
    "   - Use the output layer to convert the encoded sequence into predictions.\n",
    "\n",
    "### Constraints\n",
    "- Handle input padding correctly for variable-length sequences.\n",
    "- Ensure compatibility with batch processing by correctly shaping input and output tensors.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.optim as optim"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define a Transformer Model\n",
    "#TODO: Implement a Transformer model\n",
    "class TransformerModel(nn.Module):\n",
    "    def __init__(self, input_dim, embed_dim, num_heads, num_layers, ff_dim, output_dim):\n",
    "        super(TransformerModel, self).__init__()\n",
    "        # Add layers and modules for embedding, transformer, and output\n",
    "\n",
    "    def forward(self, x):\n",
    "        # Define the forward pass logic\n",
    "        ..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate synthetic data\n",
    "torch.manual_seed(42)\n",
    "seq_length = 10\n",
    "num_samples = 100\n",
    "input_dim = 1\n",
    "X = torch.rand(num_samples, seq_length, input_dim)  # Random sequences\n",
    "y = torch.sum(X, dim=1)  # Target is the sum of each sequence\n",
    "\n",
    "# Initialize the model, loss function, and optimizer\n",
    "input_dim = 1\n",
    "embed_dim = 16\n",
    "num_heads = 2\n",
    "num_layers = 2\n",
    "ff_dim = 64\n",
    "output_dim = 1\n",
    "\n",
    "model = TransformerModel(input_dim, embed_dim, num_heads, num_layers, ff_dim, output_dim)\n",
    "criterion = nn.MSELoss()\n",
    "optimizer = optim.Adam(model.parameters(), lr=0.001)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch [100/1000], Loss: 1.5771\n",
      "Epoch [200/1000], Loss: 0.8907\n",
      "Epoch [300/1000], Loss: 0.6074\n",
      "Epoch [400/1000], Loss: 0.3587\n",
      "Epoch [500/1000], Loss: 0.1986\n",
      "Epoch [600/1000], Loss: 0.1157\n",
      "Epoch [700/1000], Loss: 0.0762\n",
      "Epoch [800/1000], Loss: 0.0629\n",
      "Epoch [900/1000], Loss: 0.0575\n",
      "Epoch [1000/1000], Loss: 0.0379\n"
     ]
    }
   ],
   "source": [
    "# Training loop\n",
    "epochs = 1000\n",
    "for epoch in range(epochs):\n",
    "    # Forward pass\n",
    "    predictions = model(X)\n",
    "    loss = criterion(predictions, y)\n",
    "\n",
    "    # Backward pass and optimization\n",
    "    optimizer.zero_grad()\n",
    "    loss.backward()\n",
    "    optimizer.step()\n",
    "\n",
    "    # Log progress every 100 epochs\n",
    "    if (epoch + 1) % 100 == 0:\n",
    "        print(f\"Epoch [{epoch + 1}/{epochs}], Loss: {loss.item():.4f}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Predictions for [[[0.6648573279380798], [0.6041934490203857], [0.3187063932418823], [0.9813531041145325], [0.09837877750396729], [0.3223891258239746], [0.3124500513076782], [0.36122316122055054], [0.8705818057060242], [0.4751177430152893]], [[0.569571316242218], [0.05407053232192993], [0.16180634498596191], [0.8140731453895569], [0.34717607498168945], [0.6788632273674011], [0.11463749408721924], [0.21608346700668335], [0.7405895590782166], [0.8521053194999695]]]: [[5.141801834106445], [5.020108699798584]]\n"
     ]
    }
   ],
   "source": [
    "# Testing on new data\n",
    "X_test = torch.rand(2, seq_length, input_dim)\n",
    "with torch.no_grad():\n",
    "    predictions = model(X_test)\n",
    "    print(f\"Predictions for {X_test.tolist()}: {predictions.tolist()}\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
